vanilla gradient descent
Colliding with Adversaries at ECML-PKDD 2025 Adversarial Attack Competition 1st Prize Solution
Stefanopoulos, Dimitris, Voskou, Andreas
This report presents the winning solution for Task 1 of Colliding with Adversaries: A Challenge on Robust Learning in High Energy Physics Discovery at ECML-PKDD 2025. The task required designing an adversarial attack against a provided classification model that maximizes misclassification while minimizing perturbations. Our approach employs a multi-round gradient-based strategy that leverages the differentiable structure of the model, augmented with random initialization and sample-mixing techniques to enhance effectiveness. The resulting attack achieved the best results in perturbation size and fooling success rate, securing first place in the competition.
- Europe > Middle East > Cyprus > Limassol > Limassol (0.05)
- Europe > Greece > Central Macedonia > Thessaloniki (0.05)
- Information Technology > Security & Privacy (0.64)
- Government > Military (0.64)
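As a rough illustration of the gradient-based attack strategy described in the competition report above, here is a minimal sketch with random-restart initialization. The logistic-regression stand-in model, hyperparameters, and function names are assumptions for illustration only; the winning solution's multi-round and sample-mixing details are not reproduced here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def attack(x, y, w, b, eps=0.1, lr=0.02, steps=50, restarts=5, seed=0):
    """Gradient-based l_inf attack on a differentiable (logistic) classifier:
    maximize the loss w.r.t. the input, keeping the perturbation within eps.
    This is an illustrative stand-in, not the competition model or solution."""
    rng = np.random.default_rng(seed)
    best_x, best_loss = x, -np.inf
    for _ in range(restarts):
        x_adv = x + rng.uniform(-eps, eps, size=x.shape)   # random initialization
        for _ in range(steps):
            p = sigmoid(x_adv @ w + b)
            grad = (p - y) * w                             # d(cross-entropy)/d(input)
            x_adv = np.clip(x_adv + lr * np.sign(grad), x - eps, x + eps)
        p = np.clip(sigmoid(x_adv @ w + b), 1e-12, 1 - 1e-12)
        loss = -(y * np.log(p) + (1 - y) * np.log(1 - p))
        if loss > best_loss:                               # keep the best restart
            best_x, best_loss = x_adv, loss
    return best_x
```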
Beyond NTK with Vanilla Gradient Descent: A Mean-Field Analysis of Neural Networks with Polynomial Width, Samples, and Time
Despite recent theoretical progress on the non-convex optimization of two-layer neural networks, it is still an open question whether gradient descent on neural networks without unnatural modifications can achieve better sample complexity than kernel methods. This paper provides a clean mean-field analysis of projected gradient flow on polynomial-width two-layer neural networks. Different from prior works, our analysis does not require unnatural modifications of the optimization algorithm. We prove that with sample size $n = O(d^{3.1})$ where $d$ is the dimension of the inputs, the network trained with projected gradient flow converges in polynomial time to a non-trivial error that is not achievable by kernel methods using $n \ll d^4$ samples, hence demonstrating a clear separation between unmodified gradient descent and NTK. As a corollary, we show that projected gradient descent with a positive learning rate and a polynomial number of iterations converges to low error with the same sample complexity.
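As a rough sketch of the projected gradient descent step this abstract refers to, the snippet below takes one gradient step on a toy two-layer network and then projects each first-layer weight vector back onto the unit sphere. The unit-sphere projection, the squared loss, and all names are assumptions for illustration, not the paper's construction.

```python
import numpy as np

# Toy two-layer network f(x) = sum_j a_j * relu(w_j . x), trained with
# projected gradient descent (projection onto the unit sphere is assumed).

def relu(z):
    return np.maximum(z, 0.0)

def forward(X, W, a):
    # X: (n, d) inputs, W: (m, d) first-layer weights, a: (m,) output weights
    return relu(X @ W.T) @ a

def projected_gd_step(X, y, W, a, lr=1e-2):
    n = X.shape[0]
    pre = X @ W.T                          # (n, m) pre-activations
    err = forward(X, W, a) - y             # residuals on the sample
    # Gradients of the mean squared error w.r.t. a and W.
    grad_a = relu(pre).T @ err / n
    grad_W = ((pre > 0) * err[:, None] * a[None, :]).T @ X / n
    W = W - lr * grad_W
    a = a - lr * grad_a
    # Projection step: renormalize each neuron's first-layer weight vector.
    W = W / np.linalg.norm(W, axis=1, keepdims=True)
    return W, a

# Toy usage on an arbitrary target function.
rng = np.random.default_rng(0)
n, d, m = 256, 8, 64
X = rng.standard_normal((n, d))
y = np.tanh(X[:, 0] * X[:, 1])
W = rng.standard_normal((m, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)
a = rng.standard_normal(m) / m
for _ in range(1000):
    W, a = projected_gd_step(X, y, W, a)
```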
Provable Acceleration of Neural Net Training via Polyak's Momentum
Wang, Jun-Kun, Abernethy, Jacob
Incorporating a so-called "momentum" dynamic in gradient descent methods is widely used in neural net training as it has been broadly observed that, at least empirically, it often leads to significantly faster convergence. At the same time, there are very few theoretical guarantees in the literature to explain this apparent acceleration effect. In this paper we show that Polyak's momentum, in combination with over-parameterization of the model, helps achieve faster convergence in training a one-layer ReLU network on $n$ examples. We show specifically that gradient descent with Polyak's momentum decreases the initial training error at a rate much faster than that of vanilla gradient descent. We provide a bound for a fixed sample size $n$, and we show that gradient descent with Polyak's momentum converges at an accelerated rate to a small error that is controllable by the number of neurons $m$. Prior work [DZPS19] showed that using vanilla gradient descent, and with a similar method of over-parameterization, the error decays as $(1-\kappa_n)^t$ after $t$ iterations, where $\kappa_n$ is a problem-specific parameter. Our result shows that with the appropriate choice of parameters one has a rate of $(1-\sqrt{\kappa_n})^t$. This work establishes that momentum does indeed speed up neural net training.
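For context, here is a minimal sketch of the Polyak (heavy-ball) momentum update that the abstract contrasts with vanilla gradient descent, demonstrated on a simple quadratic; the step size and momentum coefficient below are illustrative choices, not the paper's.

```python
import numpy as np

def heavy_ball_step(w, w_prev, grad, lr, beta):
    """One Polyak (heavy-ball) momentum step:
    w_{t+1} = w_t - lr * grad(w_t) + beta * (w_t - w_{t-1})."""
    return w - lr * grad + beta * (w - w_prev), w

# Example on the quadratic 0.5 * w^T A w, where the accelerated
# (1 - sqrt(kappa))^t vs. (1 - kappa)^t contrast shows up in practice.
A = np.diag([1.0, 100.0])
w = w_prev = np.array([1.0, 1.0])
for _ in range(500):
    w, w_prev = heavy_ball_step(w, w_prev, A @ w, lr=0.01, beta=0.8)
print(w)  # approaches the minimizer at the origin
```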
Nonconvex Rectangular Matrix Completion via Gradient Descent without $\ell_{2,\infty}$ Regularization
Chen, Ji, Liu, Dekai, Li, Xiaodong
The analysis of nonconvex matrix completion has recently attracted much attention in the community of machine learning thanks to its computational convenience. Existing analysis on this problem, however, usually relies on $\ell_{2,\infty}$ projection or regularization that involves unknown model parameters, although they are observed to be unnecessary in numerical simulations, see, e.g. Zheng and Lafferty [2016]. In this paper, we extend the analysis of the vanilla gradient descent for positive semidefinite matrix completion proposed in Ma et al. [2017] to the rectangular case, and more significantly, improve the required sampling complexity from $\widetilde{O}(r^3)$ to $\widetilde{O}(r^2)$. Our technical ideas and contributions are potentially useful in improving the leave-one-out analysis in other related problems.
- North America > United States > California > Yolo County > Davis (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > China > Zhejiang Province > Hangzhou (0.04)
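A minimal sketch of the vanilla gradient descent scheme for rectangular matrix completion analyzed in the Chen, Liu, and Li abstract above, assuming the usual factorization $M \approx XY^\top$ and a squared loss on the observed entries; the problem sizes, step size, and initialization scale below are illustrative.

```python
import numpy as np

# Vanilla gradient descent for rectangular matrix completion (no l_{2,inf}
# projection or regularization): minimize 0.5 * ||P_Omega(X Y^T - M)||_F^2.
rng = np.random.default_rng(0)
n1, n2, r = 80, 60, 3
M = rng.standard_normal((n1, r)) @ rng.standard_normal((r, n2)) / np.sqrt(n1)
Omega = rng.random((n1, n2)) < 0.3            # observed-entry mask

X = 0.1 * rng.standard_normal((n1, r))        # small random initialization
Y = 0.1 * rng.standard_normal((n2, r))
lr = 0.01
for _ in range(3000):
    R = Omega * (X @ Y.T - M)                 # residual on observed entries
    X, Y = X - lr * R @ Y, Y - lr * R.T @ X   # plain (simultaneous) GD step
print(np.linalg.norm(X @ Y.T - M) / np.linalg.norm(M))  # relative recovery error
```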
Neural Network Foundations, Explained: Updating Weights with Gradient Descent & Backpropagation
Recall that in order for a neural network to learn, the weights associated with neuron connections must be updated after forward passes of data through the network. These weights are adjusted to help reconcile the differences between the actual and predicted outcomes for subsequent forward passes. But how, exactly, do the weights get adjusted? Before we get to the actual adjustments, think of what would be needed at each neuron in order to make a meaningful change to a given weight. Since we are talking about the difference between actual and predicted values, the error would be a useful measure here, and so each neuron will require that its respective error be sent backward through the network to it in order to facilitate the update process; hence, backpropagation of error.
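As a concrete illustration of that update step (a minimal sketch, not taken from the article), here is a one-hidden-layer example that backpropagates the error and adjusts the weights with gradient descent; the architecture, loss, and teacher-generated targets are assumptions for demonstration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(X, y, W1, W2, lr=0.5):
    """One forward pass, backpropagation of the error, and weight update."""
    n = X.shape[0]
    # Forward pass
    h = sigmoid(X @ W1)                          # hidden activations
    y_hat = sigmoid(h @ W2)                      # predicted outputs
    # Backward pass: send each layer's error signal backward
    delta2 = (y_hat - y) * y_hat * (1 - y_hat)   # output-layer error
    delta1 = (delta2 @ W2.T) * h * (1 - h)       # hidden-layer error
    # Gradient descent update on the weights (mean gradients over the batch)
    W2 -= lr * h.T @ delta2 / n
    W1 -= lr * X.T @ delta1 / n
    return W1, W2

# Toy usage: fit targets produced by a fixed random "teacher" network.
rng = np.random.default_rng(0)
X = rng.standard_normal((64, 3))
T1, T2 = rng.standard_normal((3, 4)), rng.standard_normal((4, 1))
y = sigmoid(sigmoid(X @ T1) @ T2)
W1, W2 = rng.standard_normal((3, 4)), rng.standard_normal((4, 1))
for _ in range(5000):
    W1, W2 = train_step(X, y, W1, W2)
print(np.mean((sigmoid(sigmoid(X @ W1) @ W2) - y) ** 2))  # MSE shrinks over training
```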
Stochastic Gradient Descent (SGD) with Python - PyImageSearch
In a "purist" implementation of SGD, your mini-batch size would be set to 1. However, we often use mini-batches that are > 1. Typical values include 32, 64, 128, and 256. To start, using batch sizes > 1 helps reduce variance in the parameter update, ultimately leading to a more stable convergence. Secondly, optimized matrix operation libraries are often more efficient when the input matrix size is a power of 2. In general, the mini-batch size is not a hyperparameter that you should worry much about. You basically determine how many training examples will fit on your GPU/main memory and then use the nearest power of 2 as the batch size.
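A minimal sketch of such a mini-batch SGD loop, using an illustrative least-squares linear model rather than the PyImageSearch code; the function name, data, and hyperparameters are assumptions.

```python
import numpy as np

def sgd(X, y, batch_size=32, lr=0.01, epochs=20, seed=0):
    """Mini-batch SGD on a least-squares linear model (illustrative)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        order = rng.permutation(n)                       # reshuffle each epoch
        for start in range(0, n, batch_size):
            batch = order[start:start + batch_size]
            Xb, yb = X[batch], y[batch]
            grad = Xb.T @ (Xb @ w - yb) / len(batch)     # mini-batch gradient
            w -= lr * grad                               # parameter update
    return w

# Usage: batch_size=1 gives "purist" SGD; 32/64/128/256 are common choices.
X = np.random.default_rng(1).standard_normal((1000, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0])
w_hat = sgd(X, y, batch_size=64)
```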